In silico process for selecting protein formulation excipients

ABSTRACT

The invention relates to an in silico screening method to identify candidate excipients for reducing aggregation of a protein in a formulation. The method combines computational molecular modeling and molecular dynamics simulations to identify sites on a protein where non-specific self-interaction and interaction of different test excipients may occur, determine the relative binding energies of such interactions, and select one or more test excipients that meet specified interaction criteria for use as candidate excipients in empirical screening studies.

BACKGROUND OF THE INVENTION

Lyophilized and liquid formulations of therapeutic proteins represent an increasing percentage of the pharmaceutical products that are approved each year. The excipients in such biological products not only need to be suitable for use in an injectable formulation, they must also stabilize the protein against degradation that can occur during preparation and storage of the formulated product. While various mechanisms of degradation have been identified, aggregation of protein molecules by protein self-association is probably the most common and yet the least understood mechanism. The formation of protein aggregates in a biological product can result in reduced efficacy and an increased risk of eliciting an immune response against the therapeutic protein.

The process of selecting formulation excipients that will mitigate aggregation typically involves high-throughput empirical screening of a large number of excipients and other formulation conditions, a process which consumes a significant amount of time and material. As such, there is a need to streamline this empirical process by developing in silico tools to predict whether a particular protein is prone to aggregation, to identify what regions of the protein are involved in aggregation, and to identify which excipients are most likely to mitigate such aggregation if incorporated into a candidate formulation.

SUMMARY OF THE INVENTION

The present invention helps address this need in the formulation art by providing an in silico excipient screening approach. This approach combines computational molecular modeling and molecular dynamics simulations to identify sites on a protein where non-specific self-interaction and interaction of different test excipients may occur, determine the relative binding energies of such interactions, and select one or more test excipients that meet specified interaction criteria for use as candidate excipients in empirical screening studies.

Thus, in one aspect, the invention provides an in silico screening method to identify candidate excipients for reducing aggregation of a protein in a formulation. The method comprises the following steps:

-   a) obtaining a three-dimensional (3D) structure of the protein and a 3D structure of at least one test excipient, wherein the 3D protein structure and the 3D excipient structure are at the same level of resolution;
-   b) selecting at least one region of the 3D protein structure to probe for potential sites of non-specific inter-molecular interactions between monomers of the protein and between monomers of the protein and at least one molecule of the test excipient;
-   c) protonating the selected protein region to a desired pH;
-   d) conducting a first probe-protein docking simulation over the entire surface area of the selected protein region, using as a first probe the 3D protein structure, to identify a set of one or more protein-protein docking sites whose docking scores in total equal −3 kcal/mol or lower;
-   e) conducting a second probe-protein docking simulation over the entire surface area of the selected protein region, using as a second probe the 3D excipient structure, to identify any excipient-protein docking sites and classifying for further analysis as a putative protein-excipient complex each identified site that has a docking score that is −3 kcal/mol or lower, wherein each of the first and second probe-protein docking simulations may be conducted in either order or simultaneously;
-   f) conducting a third probe-protein docking simulation over the entire surface area of each protein-excipient complex classified in step (e), using as a third probe at least one molecule of the 3D protein structure, and at all orientations of the protein, to identify each docking site for the protein on the protein-excipient complex that overlaps with any of the protein-protein docking sites identified in step (d) and classifying each overlapping site as a protein-excipient-protein sandwich; and
-   g) selecting the test excipient as a candidate excipient to reduce aggregation of the protein in the formulation if at least one protein-excipient-protein sandwich classified in step (f) has a docking score that represents a lower binding affinity than the docking score for the protein-protein docking site in the sandwich.
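The selection rule of step (g) can be sketched in a few lines of Python. This is a minimal illustration, not the method's actual implementation: the `Sandwich` record, its field names, and the example scores are hypothetical, and the sketch assumes the convention that a more negative docking score means a higher binding affinity.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sandwich:
    """Hypothetical record for one protein-excipient-protein sandwich."""
    site_id: str            # label of the overlapping protein-protein docking site
    sandwich_score: float   # docking score of the protein on the complex (kcal/mol)
    pp_score: float         # docking score of the protein-protein site (kcal/mol)


def is_candidate_excipient(sandwiches: Iterable[Sandwich]) -> bool:
    """Step (g): select the excipient if at least one sandwich binds more weakly
    (has a less negative score) than the protein-protein site it overlaps."""
    return any(s.sandwich_score > s.pp_score for s in sandwiches)


# Illustrative values only; they are not taken from the source.
example = [Sandwich("CDR-FR", sandwich_score=-2.1, pp_score=-4.5)]
print(is_candidate_excipient(example))
```

An excipient with no classified sandwiches is never selected under this reading, because `any` over an empty collection is false.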

In another embodiment of the invention, the method further comprises obtaining a 3D structure for a second test excipient that is at the same level of resolution as the protein and repeating steps (d) through (g) using the second excipient 3D structure instead of the first excipient 3D structure.

In a still further embodiment, the method further comprises examining in vitro the ability of the selected candidate excipient(s) to reduce aggregation of the protein in the formulation.

In any of the above embodiments of the invention, the resolution of the 3D structure for each of the protein and the test excipient may be at the atomic level or at an intermediate level. In such embodiments, the 3D structure may be pre-determined (i) computationally by a molecular modeling algorithm or (ii) experimentally by X-ray crystallography, nuclear magnetic resonance or cryo-electron microscopy. In some embodiments, the 3D structure of the protein is pre-determined experimentally and is obtained from a protein structure database, such as the Protein Data Bank (PDB).

In some embodiments where the 3D structure obtained for each of the protein and the test excipient is at the atomic level, the method further comprises conducting, before step (f), a molecular dynamics simulation on each putative protein-excipient complex classified in step (e) and retaining for use in step (f) each protein-excipient complex that has a binding free energy of −3 kcal/mol or lower.

In some embodiments, the protein of interest is a candidate antibody therapeutic and its 3D structure at the atomic level is pre-determined by a computational modeling process that comprises the steps of:

-   (a) providing amino acid sequences for framework regions (FR) and complementarity determining regions (CDR) in a set of antibody Fabs for which the 3D structure has been experimentally determined;
-   (b) aligning, for a first FR in the candidate antibody and the corresponding FR in each provided Fab, the amino acid sequences to identify each Fab FR that shares at least 85% sequence identity with the candidate antibody FR and selecting the 3D structure of the Fab FR that has the highest sequence identity with the candidate antibody FR for use as the structural model for the first FR;
-   (c) repeating step (b) for each FR in the candidate antibody;
-   (d) aligning, for a first CDR in the candidate antibody and the corresponding CDR in each provided Fab, the CDR amino acid sequences and selecting, for use as the structural model for the CDR of interest, the 3D structure of the Fab CDR that has about the same length as the candidate antibody CDR and is likely to form a higher ordered structure;
-   (e) repeating step (d) for each CDR in the candidate antibody;
-   (f) grafting together the 3D structures of the selected Fab FRs and Fab CDRs and mutating the FR and CDR amino acid sequences in the grafted structure to be the same as in the candidate antibody to derive a Fab structural model of the candidate antibody;
-   (g) superimposing two copies of the Fab structural model onto the structure of a full length IgG antibody with an Fc of the same isotype as the candidate antibody;
-   (h) joining the two Fab copies with the Fc using a linker that is modeled to provide the appropriate disulfide bonds and thereby derive a full-length structural model of the candidate antibody; and
-   (i) performing a molecular dynamics simulation on the full-length structural model that packs the side chains and eliminates any clashes in the structure to generate an energy minimized structure of the candidate antibody.
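The framework template choice in step (b) can be illustrated with a short Python sketch. It rests on several assumptions that the source does not specify: the sequences are already aligned to equal length, percent identity is computed over the full alignment length, and the function names, Fab identifiers and toy sequences are hypothetical.

```python
from typing import Dict, Optional, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def pick_fr_template(candidate_fr: str,
                     fab_frs: Dict[str, str],
                     min_identity: float = 85.0) -> Optional[Tuple[str, float]]:
    """Return (fab_id, identity) for the most similar Fab FR at or above the cutoff,
    or None when no provided Fab FR reaches the cutoff."""
    scored = [(fab_id, percent_identity(candidate_fr, seq))
              for fab_id, seq in fab_frs.items()]
    eligible = [item for item in scored if item[1] >= min_identity]
    return max(eligible, key=lambda item: item[1]) if eligible else None


# Toy ten-residue fragments, for illustration only.
templates = {"fabA": "QVQLVESGGG", "fabB": "QVQLQQSGAE"}
print(pick_fr_template("EVQLVESGGG", templates))
```

Returning `None` when nothing reaches the 85% cutoff leaves the fallback decision (for example, widening the Fab set) to the user, since the source does not describe one.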

In any of the above embodiments, the method may be performed on an antibody and the first and second probe docking simulations are performed for at least two antibody regions selected from the group consisting of all light chain CDRs, all heavy chain CDRs, each Fab, the Fc region and the entire antibody.

In any of the above embodiments, the formulation may be a liquid formulation.

In any of the above embodiments, the test excipient(s) may be an amino acid.

In other aspects, the invention provides (i) a machine-readable medium for carrying out the method of any of the above embodiments, comprising machine-readable instructions encoded thereon which, when executed by a processor, cause a machine having or linked to the processor to execute the method and (ii) a computer system comprising this machine-readable medium and a user interface capable of receiving the 3D structures of the protein and excipients and user selected criteria applied in one or more steps of the method.

BRIEF DESCRIPTION OF THE FIGURES

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

FIG. 1 is a graphical illustration of performing several steps in an embodiment of the in silico method of the invention showing: the 3D structure of an antibody as a monomer in its native state (Panel A); a result of the first probe-protein docking simulation that identified an intermolecular docking site between a CDR of one antibody molecule and the framework region of another molecule of the antibody (Panel B); a result of the second probe-protein docking simulation that identified an excipient docking site at the same antibody CDR and has been classified as a putative protein-excipient complex (Panel C); and a result of the third probe-protein docking simulation that identified a docking site where a protein-excipient-protein sandwich could form with the potential to disrupt protein self-association (Panel D).

FIG. 2 illustrates operability of the in silico screening method of the invention using CNTO607, a mAb with known areas of self-association (Bethea, D. et al., Protein Engineering Design & Selection, Vol 25(10) 2012), showing: (A) a ribbon diagram of this mAb self-associating with a space filled model of the same mAb; (B) a ribbon diagram of the Fab of this mAb showing a CDR and an excipient interaction site in the same region as the CDR, with the excipient shown as a ball and stick structure in yellow; (C) an enlarged view of the excipient-CDR interaction site shown in (B); (D) a ribbon diagram of a complex of two CNTO607 molecules in the same conformation as depicted in panel (A), with the interaction (i.e., docking) site including a CDR on the left molecule and a framework region (FR) on the right molecule; (E) a ribbon diagram showing the same protein-protein complex shown in panel (D) and mesh surfaces for twenty amino acids (orange color) and their interaction sites on the CDR in the left molecule; and (F) an enlarged view of the CDR and amino acid mesh surfaces shown in panel (E).

FIG. 3 illustrates screening of 20 amino acids as test excipients for reducing aggregation of CNTO607 showing: (A) the protein-protein complex from panel (D) of FIG. 2, with the dashed black circle indicating the selected region for analysis that includes the CDR-FR interaction site; (B) a graph of the binding affinities (i.e., docking scores) for 20 amino acids in the selected region; and (C) a surface rendering of the CDR-FR interaction site for the 20 amino acids that corresponds to the binding affinities shown in panel (B).

FIG. 4 illustrates the comparison of high affinity protein-protein interactions for CNTO607 using an experimentally determined 3D crystal structure (panels (A) and (B)) and a computationally determined 3D structure model (panels (C) and (D)), showing: (A) interaction of two CNTO607 molecules at the CDR-FR interaction site; (B) an enlarged view of the CDR-FR interaction site with the highest energy contacts between two FR lysine residues in one molecule and two CDR aspartic acid residues in the other molecule; (C) the same orientation of two interacting CNTO607 molecules in which the highest docking score pose has been superimposed on the crystal structure orientation; and (D) the highest energy docking contacts of the same CDR and FR amino acid residues as observed using the experimentally determined crystal structure.

FIG. 5 illustrates the potential of an excipient-protein interaction (i.e., docking) to impact protein-protein interaction (i.e., docking) via binding of the excipient to the protein-protein docking site to generate a protein-excipient-protein sandwich, with protein-excipient-protein docking for a computationally determined structure (shown in red pose) compared to crystallographic protein-protein interaction (depicted in blue ribbon diagram) and the reduction in the binding affinity (i.e., docking score) that is created by the formation of a protein-excipient-protein sandwich.

FIG. 6 illustrates the detection of intermolecular association between two molecules of a mAb that was not previously known to self-associate, showing in the left panel an interacting region between the Fc region of the full antibody and a CDR for the Fab of the antibody that had the highest docking score (depicted with red circle) and showing in the right panel additional interaction sites between the full-length antibody and the Fab.

FIG. 7 illustrates protein-excipient docking sites for three different mAbs that met a docking score requirement of −3 to −5 kcal/mol and were identified by conducting a probe-protein docking simulation over the entire surface area of the 3D structure for each mAb using as the probe one of twenty amino acids, with red meshes identifying the sites where amino acid excipients interacted with Mab A (13 of the 20 amino acids), Mab B (all 20 amino acids) and Mab C (all 20 amino acids).

DETAILED DESCRIPTION OF THE INVENTION

As used herein, including the appended claims, the singular forms of words such as “a,” “an,” and “the” include their corresponding plural references unless the context clearly dictates otherwise.

So that the invention may be more readily understood, certain technical and scientific terms are specifically defined below. Unless specifically defined elsewhere in this document, all other technical and scientific terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this invention belongs.

All references cited herein are incorporated herein by reference to the same extent as if each individual publication, patent, or published patent application was specifically and individually indicated to be incorporated by reference.

The present invention may be used in screening test excipients for use in formulating any type of protein for which a model of its three-dimensional structure is available or may be generated. As used herein, the term “protein” means any sequence of at least two amino acids (also referred to herein as “amino acid residues” or “residues”) joined together by peptide bonds between carboxyl and amino groups of adjacent amino acids, regardless of length, post-translation modification, chemical modification or function. Typically, the protein is of sufficient length to fold into a three-dimensional (3D) structure. Thus, the terms “protein”, “polypeptide”, and “peptide” are used interchangeably herein, unless otherwise apparent from the context in which the term is used. In particular, it is envisioned that the in silico screening method of the invention may be applied to cytokines, chemokines, enzymes, fusion proteins, hormones, immunoglobulins, antibodies, monoclonal antibodies (mAbs), antigen binding fragments of mAbs, and antigens, among other types of proteins intended for therapeutic use. The protein may be a naturally-occurring or recombinantly-produced protein, or may be chemically synthesized. In some embodiments, the protein may be chemically conjugated to a polymer (e.g., a pegylated protein) or to a therapeutically active moiety (e.g., an antibody-drug conjugate). The protein may incorporate unusual or unnatural amino acids.

In some embodiments, the protein to be formulated is an antibody or immunoglobulin. As used herein, the term “antibody” refers to any form of antibody that exhibits the desired biological or binding activity. Thus, it is used in the broadest sense and specifically covers, but is not limited to, monoclonal antibodies (including full length monoclonal antibodies), polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), humanized antibodies, fully human antibodies, chimeric antibodies and camelized single domain antibodies. “Parental antibodies” are antibodies obtained by exposure of an immune system to an antigen prior to modification of the antibodies for an intended use, such as humanization of an antibody for use as a human therapeutic.

In general, the basic antibody structural unit comprises a tetramer. Each tetramer includes two identical pairs of polypeptide chains, each pair having one “light” (about 25 kDa) and one “heavy” chain (about 50-70 kDa). The amino-terminal portion of each chain includes a variable region of about 100 to 110 or more amino acids that is primarily responsible for antigen recognition. The carboxy-terminal portion of the heavy chain may define a constant region that is primarily responsible for effector function. Typically, human light chains are classified as kappa and lambda light chains. Furthermore, human heavy chains are typically classified as mu, delta, gamma, alpha, or epsilon, and define the antibody's isotype as IgM, IgD, IgG, IgA, and IgE, respectively. Within light and heavy chains, the variable and constant regions are joined by a “J” region of about 12 or more amino acids, with the heavy chain also including a “D” region of about 10 more amino acids. See generally, Fundamental Immunology Ch. 7 (Paul, W., ed., 2nd ed. Raven Press, N.Y. (1989)).

The variable regions of each light/heavy chain pair form the antibody binding site. Thus, in general, an intact antibody has two binding sites. Except in bifunctional or bispecific antibodies, the two binding sites are, in general, the same.

Typically, the variable domains of both the heavy and light chains comprise three hypervariable regions, also called complementarity determining regions (CDRs), which are located within relatively conserved framework regions (FR). The CDRs are usually aligned by the framework regions, enabling binding to a specific epitope on an antigen. In general, from N-terminal to C-terminal, both light and heavy chain variable domains comprise FR1, CDR1, FR2, CDR2, FR3, CDR3 and FR4. The assignment of amino acids to each domain is, generally, in accordance with the definitions of Sequences of Proteins of Immunological Interest, Kabat, et al.; National Institutes of Health, Bethesda, Md.; 5th ed.; NIH Publ. No. 91-3242 (1991); Kabat (1978) Adv. Prot. Chem. 32:1-75; Kabat, et al., (1977) J. Biol. Chem. 252:6609-6616; Chothia, et al., (1987) J Mol. Biol. 196:901-917 or Chothia, et al., (1989) Nature 342:878-883.

As used herein, unless otherwise indicated, “antibody fragment” or “antigen binding fragment” refers to antigen binding fragments of antibodies, i.e., antibody fragments that retain the ability to bind specifically to the antigen bound by the full-length antibody, e.g., fragments that retain one or more CDR regions. Examples of antibody binding fragments include, but are not limited to, Fab, Fab′, F(ab′)₂, and Fv fragments; diabodies; linear antibodies; single-chain antibody molecules, e.g., sc-Fv; nanobodies and multispecific antibodies formed from antibody fragments.

The in silico screening method of the invention is performed using a 3D structure of the protein, i.e., a three-dimensional model that represents the protein's secondary, tertiary, and/or quaternary structure. Models of 3D structures useful in the methods of the invention include X-ray crystal structures, NMR structures, theoretical protein structures, structures created from homology modeling, Protein Tomography models, and atomistic models built from electron microscopic studies. Typically, a 3D structure will provide coordinates for the protein atoms in three-dimensional space, thus showing the protein folds and amino acid residue positions.

In some embodiments, the protein structure used in the screening method of the invention was experimentally determined by X-ray crystallography, and may be determined de novo or obtained from a protein structure database. A variety of databases that contain 3D protein structures are publicly available. One well-known database is the Protein Data Bank archive (PDB), which is managed by the Worldwide PDB organization, whose members offer various tools for searching, visualizing and analyzing PDB data, including the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (described in Berman, H. M. et al., Nucleic Acids Res. 28(1):235-42 (2000), accessed at ww.rcsb.org/pdb/), Biological Magnetic Resonance Bank (BMRB), Protein Data Bank in Europe (PDBe), and Protein Data Bank Japan (PDBj). Another database that contains 3D structures for various proteins is the Research Consortium for Structural Bioinformatics (RSCB).

In other embodiments, the protein structure used in the screening method of the invention has been computationally determined by a molecular modeling algorithm. Such algorithms typically employ homology modeling, which involves comparing a protein's primary sequence to the known three-dimensional structure of a similar protein.

Homology modeling is well known in the art. See, e.g., Xiang, Curr Protein Pept Sci. 2006 June; 7(3):217-227. For antibodies, the structure of antibody variable regions can be obtained accurately using the canonical structures method (Chothia C and Lesk A M, J. Mol. Biol. 1987, 196, 901; Chothia C et al., Nature 1989, 342, 877).

A computationally determined 3D structure of the protein may be obtained using commercially available software that employs homology modeling, e.g., MODELLER (Eswar, et al., Comparative Protein Structure Modeling With MODELLER. Current Protocols in Bioinformatics, John Wiley & Sons, Inc., Supplement 15, 5.6.1-5.6.30, 200.), SEGMOD/ENCAD (Levitt M. J Mol Biol 1992; 226:507-533), SWISS-MODEL (Schwede T, Kopp J, Guex N, Peitsch M C. Nucleic Acids Research 2003; 31:3381-3385), 3D-JIGSAW (Bates et al., Proteins: Structure, Function and Genetics, Suppl 2001; 5:39-46), NEST (Xiang, Curr Protein Pept Sci. 2006 June; 7(3):217-227), and BUILDER (Koehl and Delarue, Curr Opin Struct Biol 1996; 6(2):222-226). In some embodiments, the 3D structure was computationally generated using the Protein Modeling and Bioinformatics applications in the Molecular Operating Environment (MOE), which is a comprehensive software system available from Chemical Computing Group Inc. (CCG) (Montreal, Quebec, Canada).

In some embodiments, homology modeling may be used to assemble full proteins from known structure fragments, such as when an antibody Fab fragment is modeled onto an Fc fragment, or when a Fab fragment is created as a theoretical protein structure and modeled onto an Fc fragment crystal structure. A skilled artisan will understand that various possibilities exist. In one particular embodiment, a Fab fragment may be modeled onto various antibody Fc structures of different classes or isotypes.

An ab initio model of the protein 3D structure may also be used in the screening method of the present invention. An “ab initio structure” is created directly from the protein primary sequence by simulating the protein folding process using various equations derived from physical chemistry (Bonneau and Baker, Annual Review of Biophysics and Biomolecular Structure, 2001, Vol. 30, Pages 173-189; Lesk, Proteins 1997, 1:151-166. Suppl; Zemla, et al. Proteins 1997, 1:140-150. Suppl; Ingwall, et al. Biopolymers 1968; 6:331-368; and U.S. Pat. Nos. 6,832,162; 5,878,373; 5,436,850; 6,512,981; 7,158,891; 6,377,893; and U.S. patent application Ser. Nos. 9/788,006; 11/890,863; and 10/113,219).

In some embodiments, the 3D structure of the protein that has been obtained experimentally or computationally is processed before applying the screening method of the present invention. For example, the obtained protein structure may be put through a molecular dynamics simulation to allow the protein side chains to reach a more natural conformation, or the structure may be allowed to interact with solvent, e.g., water, in a molecular dynamics simulation. This processing is not limited to molecular dynamics simulation and can be accomplished using any art-recognized means to determine movement of a protein in a solution state (e.g., for an intended liquid formulation) or in a solid state (e.g., for an intended lyophilized formulation). An exemplary alternative simulation technique is Monte Carlo simulation. Simulations can be performed using simulation packages or any other acceptable computing means. In certain embodiments, simulations to search, probe or sample protein conformational space can be performed on a structural model to determine movement of the protein.

The in silico screening method of the invention may be used to screen a variety of test excipients that are typically examined in formulation screening studies conducted in vitro. The test excipients may be chosen from categories of stabilizing excipients that are commonly added to therapeutic protein formulations to stabilize the protein against aggregation, which include: buffering agents; amino acids and modifications thereof; salts; sugars and carbohydrates; surfactants; polymers; and chelators and antioxidants. Choosing test excipients to screen for a particular protein will typically include consideration of whether the desired formulation is to be a liquid or lyophilized formulation and what is known about the properties of the protein to be formulated.

Representative examples of buffering agents include citrate, acetate, histidine, phosphate, and Tris.

Representative examples of amino acids that have been used in protein formulations include histidine, arginine, glycine, proline, lysine and methionine. In some embodiments of the invention, all 20 amino acids are employed as test excipients.

Representative examples of salts include sodium chloride, potassium chloride, and sodium sulfate.

Representative examples of sugars and carbohydrates include sucrose, trehalose, mannitol, sorbitol, glucose and lactose.

Representative examples of surfactants include polysorbate 20, polysorbate 80, and alkylsaccharides.

Representative examples of polymers include dextran and polyethylene glycol.

Representative examples of chelators and antioxidants include EDTA, DTPA, methionine, histidine, and ethanol.

In some embodiments, the test excipients will be those that have been classified by the FDA as “generally recognized as safe” (GRAS). A database of GRAS excipients is available online at the FDA web site (www .fda.gov/Food/FoodIngredientsPackaging/GenerallyRecognizedasSafeGRAS/GRASSubstancesSCOGSDatabase/default.htm).

Other potential test excipients to be screened in the method of the invention may be selected from the FDA Inactive Ingredient database of pharmaceutical excipients found in FDA approved drugs, including parenteral products, which is also accessible on the FDA website Inactive Ingredient Search for Approved Drug Products (www.accessdata.fda.gov/scripts/cder/iig/index.cfm).

Once one or more test excipients to be screened are chosen, 3D structures need to be obtained for the test excipients. The 3D structures should be at the same level of resolution as the protein 3D structure. Excipient structures may be obtained using resources and tools known to the skilled artisan, including the scientific literature and structure databases, or may be computationally generated using modeling software that employs energy minimization, molecular dynamics and conformation search.

Once the 3D structure for the protein has been obtained and any desired conformational processing is performed, the 3D structure is protonated to a desired pH. The desired pH will typically be within a range that is dictated by the stability of the molecule under various formulation conditions, and which may be estimated from pI calculations. In particular, it is generally thought that the pH of a protein formulation should be moved away from the pI of the protein. In some embodiments, the 3D structure of the test excipient is protonated to the same pH as used for the protein structure.
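As a rough illustration of what protonating to a desired pH means for an individual titratable group, the Henderson-Hasselbalch relation gives the fraction of a group that is protonated at a given pH. The sketch below is not the protonation procedure of any particular modeling package: the pKa values are approximate textbook model values chosen as assumptions, and the majority-state rule ignores the local structural environment that dedicated tools take into account.

```python
# Approximate textbook side-chain pKa values; illustrative assumptions only.
MODEL_PKA = {"ASP": 3.9, "GLU": 4.2, "HIS": 6.0, "LYS": 10.5}


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated at this pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def is_protonated(residue: str, ph: float) -> bool:
    """Assign the majority state: protonated when more than half the population is."""
    return fraction_protonated(MODEL_PKA[residue], ph) > 0.5


# Compare assigned states at a mildly acidic pH and at a near-neutral pH.
for ph in (5.5, 7.4):
    states = {res: is_protonated(res, ph) for res in MODEL_PKA}
    print(ph, states)
```

Under these model values only histidine changes state between pH 5.5 and pH 7.4, which is one reason the choice of formulation pH can alter the charge pattern presented to a docking probe.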

To probe the protein for potential sites of intermolecular self-association and excipient interactions, two different types of probe-protein docking simulations are conducted, either simultaneously or sequentially, over the entire surface area of at least one region of interest on the protonated 3D protein structure. Molecular docking simulation software programs are well-known in the art, and include, e.g., the protein docking approach described in the Chemical Computing Group MOE 2015 documentation.

The choice of which protein region(s) to probe will typically depend on the size and type of protein, as well as the amount of computing time required to conduct the simulations. The selected region may include, for example, the part of the protein that is primarily responsible for its biological activity, e.g., the specific binding site for a different protein. For example, if the protein is a receptor, then the selected region might comprise the binding site for the biological ligand for that receptor. Similarly, if the protein is a mAb, the selected region may comprise any of the 6 CDRs. In some embodiments, the entire surface area of the protein 3D structure is probed.

In one type of the probe-protein docking simulations, the probe is the protonated 3D protein structure. In some embodiments, the protein-protein docking simulation begins with a quick configuration space sweep using Fast Fourier Transforms (FFTs) to identify grid-based interactions between two molecules of the protein. This is followed by fast rigid-body interaction energy minimization of the top configurations using the Truncated Newton approach along with a residue-based coarse-grained (CG) representation that includes the following energy components: van der Waals, electrostatic, and solvation via the Generalized Born/Volume Integral (GB/VI) formalism. Since all docking studies use static structures, the resulting improvements in biophysical properties are applicable to both liquid and lyophilized formulations.

The docking scores (e.g., binding affinity) for each site of protein-protein interaction are determined and ranked. A protein is classified as having a potential for intermolecular self-association (e.g., aggregation) if there is at least one set of one or more docking sites for which the sum of the individual docking scores is −3 kcal/mol or lower. As will be understood by the skilled artisan, binding affinity increases as a docking score (ΔG) gets more negative. Thus, the likelihood of a protein being susceptible to inter-molecular association increases as the summed docking scores for protein-protein docking sites, which are identified in the docking simulation, become more negative. If the docking simulation for a selected protein region does not generate a set of protein-protein docking sites that satisfy the docking score requirement, then the user may choose to repeat the simulation over the surface area of a different region or the entirety of the protein 3D structure.
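The summed-score criterion can be expressed as a short check. The sketch below reflects one reading of the text, stated here as an assumption: any non-empty subset of the identified docking sites may serve as the "set", so a qualifying set exists exactly when the sum of all negative scores (or a single score on its own) reaches the threshold. The function name and example scores are hypothetical.

```python
from typing import Sequence


def may_self_associate(scores: Sequence[float], threshold: float = -3.0) -> bool:
    """True if some non-empty set of docking sites has summed scores <= threshold (kcal/mol).

    The most negative total any subset can reach is the sum of all negative scores,
    so it is enough to test that one sum.
    """
    negatives = [s for s in scores if s < 0]
    if not negatives:
        return False
    return sum(negatives) <= threshold


# Illustrative scores in kcal/mol; they are not taken from the source.
print(may_self_associate([-1.2, -1.1, -0.9]))   # three weak sites that together qualify
print(may_self_associate([-1.0, -0.5, 0.4]))    # no set reaches the threshold
```

If the first call returns false for a selected region, the text's fallback is to rerun the docking over a different region or the whole protein before concluding that self-association is unlikely.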

In the other type of probe-protein docking simulation, the probe is the 3D excipient structure, which, if applicable, is protonated to the same pH as used for the protein structure. In some embodiments, models of amino acid excipients may be created from SMILES (Simplified molecular-input line-entry system) code and built using the builder module in the Chemical Computing Group (CCG) Molecular Operating Environment (MOE), or alternatively drawn with ChemDraw and then imported into CCG MOE. Subsequently, the amino acid structures may be minimized and/or protonated to achieve the conditions required to support protein-excipient docking simulations.

In some embodiments, the excipient-protein docking starts with a conformational analysis of the excipient, followed by placement of different conformations of the excipient on the selected region of the protein (i.e., poses) and then calculation of an initial score for the interaction energy based on the poses. The initial scoring is further refined using either an explicit molecular mechanics force field method or a grid-based energetics method. The final docking score is then calculated using any established scoring scheme. Top scoring poses are screened to remove duplicate poses. Any individual excipient-protein docking site that has a docking score of −3 kcal/mol or lower (i.e., more negative) is classified as a putative protein-excipient complex. If an excipient fails to dock on the protein with a docking score that meets this requirement, then the user may choose to perform the probe-protein docking simulation with another excipient. It is contemplated that the results of conducting multiple probe-protein simulations using different individual excipients may be combined to identify docking sites where at least two different excipients may interact. The user may evaluate the relative docking scores to predict whether including both excipients in the formulation could be beneficial or detrimental in terms of reducing aggregation.
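As a rough illustration of the last two screening steps (duplicate removal and the −3 kcal/mol cutoff), the following Python sketch filters a list of scored poses. The text does not specify how duplicate poses are recognized; the sketch assumes, purely for illustration, that two poses are duplicates when their centroids lie within a small distance tolerance. The `Pose` class, the `duplicate_tol` parameter and the example values are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Pose:
    excipient: str
    centroid: Tuple[float, float, float]  # pose centroid in angstroms (assumed representation)
    score: float                          # final docking score in kcal/mol


def putative_complexes(
    poses: Sequence[Pose],
    threshold: float = -3.0,     # cutoff stated in the text
    duplicate_tol: float = 1.0,  # assumed duplicate criterion, in angstroms
) -> List[Pose]:
    """Keep poses scoring at or below the threshold, dropping any pose whose
    centroid lies within duplicate_tol of a better-scoring pose already kept."""
    kept: List[Pose] = []
    for pose in sorted(poses, key=lambda p: p.score):  # most negative first
        if pose.score > threshold:
            break  # every remaining pose scores worse than the cutoff
        if all(math.dist(pose.centroid, k.centroid) > duplicate_tol for k in kept):
            kept.append(pose)
    return kept


# Hypothetical poses for a single excipient.
example_poses = [
    Pose("Arg", (0.0, 0.0, 0.0), -5.1),
    Pose("Arg", (0.2, 0.0, 0.0), -4.8),   # near-duplicate of the first pose
    Pose("Arg", (10.0, 0.0, 0.0), -3.0),  # exactly at the cutoff, so retained
    Pose("Arg", (20.0, 0.0, 0.0), -2.5),  # fails the cutoff
]
complexes = putative_complexes(example_poses)
```

With these hypothetical inputs, two poses survive: the best-scoring pose at the first site and the pose that scores exactly −3 kcal/mol at a second site.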

If the first two docking simulations generate (i) at least one set of protein-protein docking sites that meet the docking score requirement stated above (total of individual scores ≤ −3 kcal/mol) and (ii) at least one putative protein-excipient complex, then a third type of probe-protein simulation is performed. This simulation is performed over the entire surface area of each protein-excipient complex using as the probe at least one molecule of the 3D protein structure. All orientations of the protein structure are evaluated in the docking simulation. The goal of this simulation is to identify any docking sites for the protein on the protein-excipient complex that overlap with any protein-protein docking site that met the docking score requirement. As used herein, the two different types of docking sites are deemed to be overlapping if protein binding to the protein-excipient complex is within a 4 angstrom radius of the perimeter defined by the protein-protein docking site. Such overlapping sites are classified as a protein-excipient-protein sandwich.
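The 4 angstrom overlap criterion can be illustrated with a simple geometric check. The sketch below assumes, for illustration only, that each site is available as a list of 3D coordinates: the contact points of the probe protein on the protein-excipient complex, and the points defining the perimeter of the protein-protein docking site. The function name and the coordinate representation are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

OVERLAP_RADIUS_ANGSTROM = 4.0  # radius stated in the text


def sites_overlap(
    probe_contacts: Sequence[Point],
    pp_site_perimeter: Sequence[Point],
    radius: float = OVERLAP_RADIUS_ANGSTROM,
) -> bool:
    """Return True if any probe contact point lies within `radius` of any
    point on the perimeter of the protein-protein docking site."""
    return any(
        math.dist(contact, edge) <= radius
        for contact in probe_contacts
        for edge in pp_site_perimeter
    )


# Hypothetical coordinates in angstroms.
perimeter = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
near_contacts = [(8.5, 0.0, 0.0)]   # 3.5 angstroms from the second perimeter point
far_contacts = [(12.0, 0.0, 0.0)]   # 7.0 angstroms from the nearest perimeter point
```

Here `sites_overlap(near_contacts, perimeter)` evaluates to true and the site would be classified as a sandwich, while `sites_overlap(far_contacts, perimeter)` evaluates to false.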

In some embodiments, any or all of the protein-excipient, protein-protein or protein-excipient-protein docking simulations can employ counterions, water, and buffer components.

For each protein-excipient-protein sandwich that is identified, the docking score for the entire sandwich (ΔG_(PEP)) is compared to the docking score for the protein-protein docking site in that sandwich (ΔG_(pp)). If at least one sandwich has a ΔG_(PEP) that is less negative than ΔG_(pp) (i.e., the presence of the excipient in the sandwich reduced the protein-protein binding affinity), then the test excipient is selected as a candidate excipient for use in the formulation. Typically, the difference between ΔG_(PEP) and ΔG_(pp) should be at least 3 kcal/mol, and preferably at least any of 5, 10, 15, 20, 25 or 30 kcal/mol. If no protein-excipient-protein sandwiches meet this requirement (e.g., ΔG_(pp) − ΔG_(PEP) is ≥ −3 kcal/mol), then the user may choose to repeat the second and third types of probe-protein simulations using a different excipient.
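The selection rule in the preceding paragraph amounts to comparing two docking scores per sandwich. A minimal Python sketch follows, assuming that each sandwich is summarized as a pair (ΔG_(PEP), ΔG_(pp)) in kcal/mol; the function name, the `min_shift` parameter and the example values are hypothetical.

```python
from typing import Sequence, Tuple

# Each sandwich is summarized as (dg_pep, dg_pp) in kcal/mol (assumed representation).
Sandwich = Tuple[float, float]


def is_candidate_excipient(
    sandwiches: Sequence[Sandwich],
    min_shift: float = 3.0,  # typical minimum difference stated in the text
) -> bool:
    """Return True if at least one sandwich has a docking score that is less
    negative than its protein-protein score by at least `min_shift` kcal/mol."""
    for dg_pep, dg_pp in sandwiches:
        shift = dg_pep - dg_pp  # positive when the excipient weakens binding
        if shift > 0 and shift >= min_shift:
            return True
    return False


# Hypothetical scores: the excipient weakens binding by 5.5 kcal/mol.
weakened = [(-4.0, -9.5)]
# Hypothetical scores: the shift is only 1.5 kcal/mol, below the typical minimum.
marginal = [(-8.0, -9.5)]
```

Raising `min_shift` to 5, 10 or more corresponds to the preferred, more stringent differences mentioned in the text.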

In some embodiments, the ability of each candidate excipient to reduce aggregation of the protein is examined by conducting appropriate in vitro experiments that are designed to assess protein aggregation and preferably protein stability.

EXAMPLE 1

This example illustrates the thermodynamic principles of various docking interactions that are evaluated in the screening method of the invention: protein-protein docking, excipient-protein docking and protein-excipient-protein (sandwich) docking.

EXAMPLE 2

To demonstrate the utility of the in silico screening method of the invention to identify sites of intermolecular protein interactions as well as screen test excipients that may reduce such interactions, four different mAbs were used as model proteins and 20 amino acids were used as test excipients. The CNTO607 mAb was used as a model of a protein for which the 3D crystal structure was publicly available and for which significant aggregation problems had been identified (Bethea et al., supra). The other model proteins were three mAbs whose 3D structures had not been experimentally determined and whose propensity for self-association was unknown. The approach was a direct application of probe-protein docking simulations to determine protein-protein docking sites, protein-excipient docking sites and relative docking scores.

Excipient Structures

The specific excipients evaluated in this assessment were the 20 standard amino acids (Histidine, Alanine, Isoleucine, Arginine, Leucine, Asparagine, Lysine, Aspartic acid, Methionine, Cysteine, Phenylalanine, Glutamic acid, Threonine, Glutamine, Tryptophan, Glycine, Valine, Proline, Serine, Tyrosine). Structural models of these amino acids were created from SMILES (Simplified molecular-input line-entry system) code and built using the builder module in the Chemical Computing Group (CCG) Molecular Operating Environment (MOE), or alternatively drawn with ChemDraw and then imported into MOE. Subsequently, the amino acid structures were minimized and/or protonated to achieve the conditions required to support protein-excipient docking simulations.

Protein Structures

The 3D structure for CNTO607 was obtained from RCSB (Research Collaboratory for Structural Bioinformatics) in pdb format (reference file ID 3G6A). Parameterization of this mAb was achieved using AMBER 10 EHT. The structure was subsequently prepared for docking by fixing any missing structure elements and by energy minimization (to a gradient RMS of 0.1 kcal/mol/Å²) to remove steric clashes. For antibodies for which no experimentally determined crystal structure was available (MAB 1, 2, and 3), homology modeling was used to computationally determine the antibody structure.

Protein-Excipient Docking

Amino acid test excipients were docked with a model protein utilizing the following approach. Excipient docking was applied directly to determine sites of interaction and relative docking scores (interaction energy) for the excipients. Docking was performed using the protein-protein docking approach described in the Chemical Computing Group MOE 2015 documentation.

The specific sequence of events included 1) placement, which generates poses from a structure library, 2) placement scoring (utilizing Triangle Matcher and scored with London ΔG), 3) refinement to minimize structures (using Rigid Receptor and scored by GBVI/WSA ΔG), and 4) final scoring, which eliminates duplicate poses.

The binding sites with the highest docking scores were then identified by rank ordering the docking scores. The highest affinity sites were then interrogated by molecular dynamics simulation (~60 ns) to validate the protein-excipient interaction and to confirm that the excipient did not dissociate during the simulation.

The docking was repeated for multiple excipients with each model protein. The highest binding affinities for CNTO607 as the model protein and the 20 amino acids are presented in FIG. 3, panel B. All 20 amino acids had protein-excipient docking scores (ΔG_(PE)) of −3 kcal/mol or lower and hence could be considered as putative protein-excipient complexes for further analysis. An example of protein-excipient docking is presented in FIG. 2, panels B and C. Surface rendering of the interaction sites for all twenty amino acids is presented in FIG. 2, panels E and F, and in FIG. 3, panel C, and these sites correspond to the interaction energies plotted in FIG. 3, panel B.

Protein-Protein Docking

Interaction sites for protein intermolecular self-association are illustrated in FIG. 2. The docking algorithm (designated as protein-protein docking in the Chemical Computing Group MOE 2015 package) was employed for this example. Details of the algorithm include defining the receptor site and ligand site, creating a coarse-grained rigid-body representation of the structures, creating a grid-based representation of the receptor field, generating a set of rotations for the ligand, calculating the grid-based energy for all translations using FFT convolutions, filtering the conformations followed by minimization of the poses, additional minimization of the poses to account for solvation free energy using GBVI, and final ranking according to energy.

The crystal structure of the protein-protein interaction for CNTO607 is presented in FIG. 2, panels A and D, in FIG. 3, panel A, and in FIG. 4, panel A. Using the crystal structure as a reference, protein-protein docking was performed using the MOE protocol and the structure with the highest docking score was compared to the original crystal structure. FIG. 4, panel C shows a ribbon diagram of the crystal structure in blue overlaid with the docking pose in red. The two structures look very similar, and evaluation of the highest interaction sites reveals that both structures employ a conserved set of residues to achieve this interaction between a CDR and the framework region (FIG. 4, panels B and D). The final calculated interaction energies and orientation are slightly different but are consistent with crystallographically determined contacts. Thus, the highest energy contacts are conserved between crystal structure (C) and modeling results (D).

Protein-Excipient-Protein Docking

Using the site with the highest affinity docking score from the protein-excipient docking simulation, a protein-excipient-protein docking simulation was performed to evaluate the impact of amino acid excipients on disrupting the original protein-protein docking conformation and the crystallographic conformation as depicted in FIG. 4. Compared to the original orientation of CNTO607 self-association shown in blue, the highest scoring docking pose adopted a conformation very different from that observed in the initial protein-protein docking simulation and confirmed experimentally (FIG. 5).

Application to Model Antibodies with Unknown Self-Association Properties

Following the same algorithm as described above, separate protein-protein docking and protein-excipient docking simulations were performed. The protein-protein docking defined the Fc region of the protein as the receptor site and the CDR region of the Fab as the ligand. Following the same algorithm, a site of interaction between a CDR on a Fab molecule and a framework region on the full-length mAb was identified (FIG. 6, left panel). The highest protein-protein docking score utilized a tangential interaction of the CDR and partial framework region overlap as depicted in FIG. 6. This indicates that the Fc region of this mAb is not involved in intermolecular self-association with the CDR. Further, the region of overlap coincides with the region of the mAb where some of the amino acid excipients interact as determined by protein-excipient docking simulation (FIG. 7, panel A). Interestingly, putative protein-excipient complexes were identified for only 13 of the 20 amino acids (R, N, D, Q, E, H, I, L, K, M, F, W, and Y) with Mab A, while all 20 amino acids produced putative protein-excipient complexes for Mab B and Mab C.

1. A method of in silico screening of test excipients to select candidate excipients for reducing aggregation of a protein in a formulation, comprising the steps of:
a) obtaining a three-dimensional (3D) structure of the protein and a 3D structure of at least one test excipient, wherein the 3D protein structure and the 3D excipient structure are at the same level of resolution;
b) selecting at least one region of the 3D protein structure to probe for potential sites of non-specific inter-molecular interactions between monomers of the protein and between monomers of the protein and at least one molecule of the test excipient;
c) protonating the selected protein region to a desired pH;
d) conducting a first probe-protein docking simulation over the entire surface area of the selected protein region, using as a first probe the 3D protein structure, to identify a set of one or more protein-protein docking sites whose docking scores in total equal −3 kcal/mol or lower;
e) conducting a second probe-protein docking simulation over the entire surface area of the selected protein region, using as a second probe the 3D excipient structure, to identify any excipient-protein docking sites and classifying for further analysis as a putative protein-excipient complex each identified site that has a docking score that is −3 kcal/mol or lower, wherein each of the first and second probe-protein docking simulations may be conducted in either order or simultaneously;
f) conducting a third probe-protein docking simulation over the entire surface area of each protein-excipient complex classified in step (e), using as a third probe at least one molecule of the 3D protein structure, and at all orientations of the protein, to identify each docking site for the protein on the protein-excipient complex that overlaps with any of the protein-protein docking sites identified in step (d) and classifying each overlapping site as a protein-excipient-protein sandwich; and
g) selecting the test excipient as a candidate excipient to reduce aggregation of the protein in the formulation if at least one protein-excipient-protein sandwich classified in step (f) has a docking score that represents a lower binding affinity than the docking score for the protein-protein docking site in the sandwich.
2. The method of claim 1, wherein the resolution is at the atomic level or is at an intermediate level.
3. The method of claim 1, wherein the resolution is at the atomic level and the method further comprises conducting, before step (f), a molecular dynamics simulation on each putative protein-excipient complex classified in step (e) and retaining for use in step (f) each protein-excipient complex that has a binding free energy of −3 kcal/mol or lower.
4. The method of claim 1, wherein the resolution is at the atomic level and pre-determined (i) computationally by a molecular modeling algorithm or (ii) experimentally by X-ray crystallography, nuclear magnetic resonance or cryo-electron microscopy.
5. The method of claim 1, wherein the 3D structure of the protein is obtained from a protein structure database.
6. The method of claim 5, wherein the protein structure is pre-determined experimentally and the database is the Protein Data Bank (PDB).
7. The method of claim 4, wherein the protein is a candidate antibody therapeutic and its structure is pre-determined by a computational modeling process that comprises the steps of:
a) providing amino acid sequences for framework regions (FR) and complementarity determining regions (CDR) in a set of antibody Fabs for which the 3D structure has been experimentally determined;
b) aligning, for a first FR in the candidate antibody and the corresponding FR in each provided Fab, the amino acid sequences to identify each Fab FR that shares at least 85% sequence identity with the candidate antibody FR and selecting the 3D structure of the Fab FR that has the highest sequence identity with the candidate antibody FR for use as the structural model for the first FR;
c) repeating step (b) for each FR in the candidate antibody;
d) aligning, for a first CDR in the candidate antibody and corresponding CDR in each provided Fab, the CDR amino acid sequences and selecting, for use as the structural model for the FR of interest, the 3D structure of the Fab CDR that has about the same length as the candidate antibody CDR and is likely to form a higher ordered structure;
e) repeating step (d) for each CDR in the candidate antibody;
f) grafting together the 3D structures of the selected Fab FRs and Fab CDRs and mutating the FR and CDR amino acid sequences in the grafted structure to be the same as in the candidate antibody to derive a Fab structural model of the candidate antibody;
g) superimposing two copies of the Fab structural model onto the structure of a full length IgG antibody with an Fc of the same isotype as the candidate antibody;
h) joining the two Fab copies with the Fc using a linker that is modeled to provide the appropriate disulfide bonds and thereby derive a full-length structural model of the candidate antibody;
i) performing a molecular dynamics simulation on the full-length structural model that packs the side chains and eliminates any clashes in the structure to generate an energy minimized structure of the candidate antibody.
8. The method of claim 1, wherein the excipient is an amino acid.
9. The method of claim 1, wherein the formulation is a liquid formulation and the protein is an antibody.
10. The method of claim 9, wherein the excipient is an amino acid.
11. The method of claim 1, wherein the protein is an antibody and the first and second probe docking simulations are performed for at least two antibody regions selected from the group consisting of all light chain CDRs, all heavy chain CDRs, each Fab, the Fc region and the entire antibody.
12. The method of claim 1, wherein the 3D excipient structure is protonated to the same pH as the selected protein region.
13. The method of claim 1, further comprising examining in vitro the ability of the candidate excipient to reduce aggregation of the protein in the formulation.
14. A machine-readable medium for carrying out the method of claim 1, comprising machine-readable instructions encoded thereon which, when executed by a processor, cause a machine having or linked to the processor to execute the method.
15. A computer system comprising the machine-readable medium of claim 14 and a user interface capable of receiving the 3D structures of the protein and excipients and user selected criteria applied in one or more steps of the method.